    Hardness Amplification of Optimization Problems

    In this paper, we prove a general hardness amplification scheme for optimization problems based on the technique of direct products. We say that an optimization problem Π is direct product feasible if it is possible to efficiently aggregate any k instances of Π into one large instance of Π such that, given an optimal feasible solution to the larger instance, we can efficiently find optimal feasible solutions to all k smaller instances. Given a direct product feasible optimization problem Π, our hardness amplification theorem may be informally stated as follows: if there is a distribution D over instances of Π of size n such that every randomized algorithm running in time t(n) fails to solve Π on a 1/α(n) fraction of inputs sampled from D, then, assuming some relationships between α(n) and t(n), there is a distribution D' over instances of Π of size O(n·α(n)) such that every randomized algorithm running in time t(n)/poly(α(n)) fails to solve Π on a 99/100 fraction of inputs sampled from D'. As a consequence of the above theorem, we show hardness amplification for problems in various classes: NP-hard problems such as Max-Clique, Knapsack, and Max-SAT; problems in P such as Longest Common Subsequence, Edit Distance, and Matrix Multiplication; and even problems in TFNP such as Factoring and computing a Nash equilibrium.
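
    To make the aggregation step concrete, here is a minimal Python sketch (my own illustration, not taken from the paper) of direct product feasibility for Max-Clique: k graphs are aggregated via the graph join, and an optimal clique of the join splits back into optimal cliques of the k parts. The helper names (join, max_clique, split_solution) and the brute-force solver are assumptions for illustration only.

        # Hypothetical illustration of direct product feasibility for Max-Clique.
        from itertools import combinations

        def join(graphs):
            """Aggregate k instances: disjoint union plus all cross edges.
            Each instance is (num_vertices, edge_set over range(num_vertices))."""
            offsets, total, edges = [], 0, set()
            for n_i, es in graphs:
                offsets.append(total)
                edges |= {(u + total, v + total) for (u, v) in es}
                total += n_i
            for i in range(len(graphs)):           # connect every pair of parts
                for j in range(i + 1, len(graphs)):
                    for u in range(graphs[i][0]):
                        for v in range(graphs[j][0]):
                            edges.add((offsets[i] + u, offsets[j] + v))
            return total, edges

        def max_clique(n, edges):
            """Brute-force maximum clique (exponential time; illustration only)."""
            adj = {frozenset(e) for e in edges}
            for r in range(n, 0, -1):
                for cand in combinations(range(n), r):
                    if all(frozenset(p) in adj for p in combinations(cand, 2)):
                        return list(cand)
            return []

        def split_solution(graphs, clique):
            """Recover an optimal clique of each small instance from the big one."""
            parts, off = [], 0
            for n_i, _ in graphs:
                parts.append([v - off for v in clique if off <= v < off + n_i])
                off += n_i
            return parts

        # Two toy instances: a triangle and a single edge.
        g1 = (3, {(0, 1), (1, 2), (0, 2)})
        g2 = (2, {(0, 1)})
        n, e = join([g1, g2])
        print(split_solution([g1, g2], max_clique(n, e)))  # optimal cliques of g1 and g2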

    Towards a General Direct Product Testing Theorem

    The Direct Product encoding of a string a in {0,1}^n on an underlying domain V subseteq ([n] choose k) is a function DP_V(a) which gets as input a set S in V and outputs a restricted to S. In the Direct Product Testing Problem, we are given a function F: V -> {0,1}^k, and our goal is to test whether F is close to a direct product encoding, i.e., whether there exists some a in {0,1}^n such that on most sets S, we have F(S) = DP_V(a)(S). A natural test is as follows: select a pair (S, S') in V x V according to some underlying distribution, query F on this pair, and check for consistency on their intersection. Note that the above distribution may be viewed as a weighted graph over the vertex set V and is referred to as a test graph. The testability of direct products was studied over various domains and test graphs: Dinur and Steurer (CCC '14) analyzed it when V equals the k-th slice of the Boolean hypercube and the test graph is a member of the Johnson graph family. Dinur and Kaufman (FOCS '17) analyzed it for the case where V is the set of faces of a Ramanujan complex, where in this case |V| = O_k(n). In this paper, we study the testability of direct products in a general setting, addressing the question: what properties of the domain and the test graph allow one to prove a direct product testing theorem? Towards this goal, we introduce the notion of coordinate expansion of a test graph. Roughly speaking, a test graph is a coordinate expander if it has global and local expansion, and has certain nice intersection properties on sampling. We show that whenever the test graph has coordinate expansion, it admits a direct product testing theorem. Additionally, for every k and n, we provide a direct product domain V subseteq ([n] choose k) of size n, called the Sliding Window domain, for which we prove direct product testability.
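
    As a toy illustration of the two-query test (my own minimal instantiation, not one of the paper's domains), the following Python sketch samples a pair of k-sets, queries F on both, and accepts only if the answers agree on the intersection; an honest direct product encoding always passes. The sampler here draws two independent random k-subsets, whereas, e.g., the Johnson test graph would instead fix the intersection size.

        import random

        def dp_encoding(a):
            """Honest encoder: F(S) returns a restricted to the set S."""
            return lambda S: tuple(a[i] for i in S)

        def consistency_test(F, sample_pair, trials=100):
            """Accept iff F(S) and F(S') agree on S ∩ S' for all sampled pairs."""
            for _ in range(trials):
                S, S2 = sample_pair()
                ans1, ans2 = dict(zip(S, F(S))), dict(zip(S2, F(S2)))
                if any(ans1[i] != ans2[i] for i in set(S) & set(S2)):
                    return False  # inconsistency caught on the intersection
            return True

        n, k = 20, 5
        def sample_pair():  # toy test graph: two independent random k-subsets of [n]
            return (tuple(sorted(random.sample(range(n), k))),
                    tuple(sorted(random.sample(range(n), k))))

        a = [random.randint(0, 1) for _ in range(n)]
        print(consistency_test(dp_encoding(a), sample_pair))  # True: honest F passes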

    Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

    Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer, and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within approximation factor poly(log n). In this paper, we provide an algorithm with running time Õ(n^{2-2/7}) that approximates the edit distance within a constant factor.
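
    For reference, here is a short Python implementation of the classical quadratic-time dynamic program mentioned above (a standard textbook routine; the paper's Õ(n^{2-2/7})-time approximation algorithm is what improves on it):

        def edit_distance(x, y):
            """Exact edit distance via the classical O(|x|·|y|) dynamic program."""
            m, n = len(x), len(y)
            dp = [[0] * (n + 1) for _ in range(m + 1)]
            for i in range(m + 1):
                dp[i][0] = i            # delete all of x[:i]
            for j in range(n + 1):
                dp[0][j] = j            # insert all of y[:j]
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    dp[i][j] = min(dp[i-1][j-1] + (x[i-1] != y[j-1]),  # sub/match
                                   dp[i-1][j] + 1,                     # deletion
                                   dp[i][j-1] + 1)                     # insertion
            return dp[m][n]

        assert edit_distance("kitten", "sitting") == 3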

    An Algorithmic Bridge Between Hamming and Levenshtein Distances

    The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider substitutions being cheaper than indels, with cost 1/a for a parameter a ≥ 1. This basic variant, denoted ED_a, bridges classical edit distance (a = 1) with Hamming distance (a → ∞), leading to interesting algorithmic challenges: Does the time complexity of computing ED_a interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating ED_a? We first present a simple deterministic exact algorithm for ED_a and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a (1+ε)-approximation of ED_a(X,Y), given strings X, Y of total length n and a bound k ≥ ED_a(X,Y). For simplicity, let us focus on k ≥ 1 and a constant ε > 0; then, our algorithm takes Õ(n/a + ak^3) time. Unless a = Õ(1), this running time is sublinear in n for small enough k. We also consider a very natural version that asks to find a (k_I, k_S)-alignment -- an alignment with at most k_I indels and k_S substitutions. In this setting, we give an exact algorithm and, more importantly, an Õ(nk_I/k_S + k_S·k_I^3)-time (1, 1+ε)-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for ED_a with a = Θ(k_S/k_I). These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving a (1+ε)-approximation in sublinear time, even for a favorable choice of k. (Full version of a paper accepted to ITCS 2023; the abstract was shortened to meet arXiv requirements.)
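
    The weighted dynamic program below is a minimal sketch of how ED_a is defined (substitutions cost 1/a, indels cost 1); it runs in quadratic time and is meant only to pin down the cost model, not to reproduce the paper's near-optimal exact algorithm or its sublinear-time approximation. Exact rational arithmetic and an integral a are assumptions made here for clarity.

        from fractions import Fraction

        def ed_a(x, y, a):
            """ED_a: indels cost 1, substitutions cost 1/a (a assumed integral here)."""
            sub = Fraction(1, a)
            m, n = len(x), len(y)
            dp = [[Fraction(0)] * (n + 1) for _ in range(m + 1)]
            for i in range(m + 1):
                dp[i][0] = Fraction(i)          # i deletions
            for j in range(n + 1):
                dp[0][j] = Fraction(j)          # j insertions
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    dp[i][j] = min(dp[i-1][j-1] + (sub if x[i-1] != y[j-1] else 0),
                                   dp[i-1][j] + 1,     # delete x[i-1]
                                   dp[i][j-1] + 1)     # insert y[j-1]
            return dp[m][n]

        assert ed_a("kitten", "sitting", 1) == 3   # a = 1 recovers unit-cost edit distance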

    Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal

    We study the problem of approximating edit distance in sublinear time. This is formalized as a promise problem (k, k^c)-Gap Edit Distance, where the input is a pair of strings X, Y and parameters k, c > 1, and the goal is to return YES if ED(X,Y) ≤ k and NO if ED(X,Y) > k^c. Recent years have witnessed significant interest in designing sublinear-time algorithms for Gap Edit Distance. We resolve the non-adaptive query complexity of Gap Edit Distance, improving over several previous results. Specifically, we design a non-adaptive algorithm with query complexity Õ(n/k^{c-0.5}), and further prove that this bound is optimal up to polylogarithmic factors. Our algorithm also achieves optimal time complexity Õ(n/k^{c-0.5}) whenever c ≥ 1.5. For 1 < c < 1.5, the running time of our algorithm is Õ(n/k^{2c-1}). For the restricted case of k^c = Ω(n), this matches a known result [Batu, Ergün, Kilian, Magen, Raskhodnikova, Rubinfeld, and Sami, STOC 2003], and in all other (nontrivial) cases, our running time is strictly better than all previous algorithms, including the adaptive ones.
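
    To pin down the promise problem itself, here is a reference decider in Python using exact edit distance; it only specifies the YES/NO gap and deliberately ignores the paper's actual contribution of answering the same question with Õ(n/k^{c-0.5}) non-adaptive queries.

        def edit_distance(x, y):
            """Exact edit distance (quadratic time, linear space)."""
            prev = list(range(len(y) + 1))
            for i, cx in enumerate(x, 1):
                cur = [i]
                for j, cy in enumerate(y, 1):
                    cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j-1] + (cx != cy)))
                prev = cur
            return prev[-1]

        def gap_edit_distance(x, y, k, c):
            """(k, k^c)-Gap Edit Distance: YES if ED ≤ k, NO if ED > k^c."""
            d = edit_distance(x, y)
            if d <= k:
                return "YES"
            if d > k ** c:
                return "NO"
            return None  # outside the promise: any answer is acceptable

        print(gap_edit_distance("abcdef", "abcdxf", k=1, c=2))  # ED = 1 -> YES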

    Can You Solve Closest String Faster than Exhaustive Search?

    We study the fundamental problem of finding the best string to represent a given set, in the form of the Closest String problem: given a set X ⊆ Σ^d of n strings, find the string x* minimizing the radius of the smallest Hamming ball around x* that encloses all the strings in X. In this paper, we investigate whether the Closest String problem admits algorithms that are faster than the trivial exhaustive search algorithm. We obtain the following results for the two natural versions of the problem:
    • In the continuous Closest String problem, the goal is to find the solution string x* anywhere in Σ^d. For binary strings, the exhaustive search algorithm runs in time O(2^d · poly(nd)), and we prove that it cannot be improved to time O(2^{(1-ε)d} · poly(nd)), for any ε > 0, unless the Strong Exponential Time Hypothesis fails.
    • In the discrete Closest String problem, x* is required to be in the input set X. While this problem is clearly in polynomial time, its fine-grained complexity has been pinpointed to be quadratic time n^{2±o(1)} whenever the dimension satisfies ω(log n) < d < n^{o(1)}. We complement this known hardness result with new algorithms, proving essentially that whenever d falls outside this hard range, the discrete Closest String problem can be solved faster than exhaustive search. In the small-d regime, our algorithm is based on a novel application of the inclusion-exclusion principle.
    Interestingly, all of our results apply (and some are even stronger) to the natural dual of the Closest String problem, called the Remotest String problem, where the task is to find a string maximizing the Hamming distance to all the strings in X.
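
    The two baselines discussed above are easy to state in code; the following Python sketch (an illustration, not the paper's improved algorithms) gives the O(2^d · poly(nd)) exhaustive search for the continuous binary case and the trivial quadratic-time scan for the discrete case.

        from itertools import product

        def hamming(u, v):
            return sum(a != b for a, b in zip(u, v))

        def radius(c, X):
            """Radius of the smallest Hamming ball around c enclosing all of X."""
            return max(hamming(c, x) for x in X)

        def continuous_closest_string(X, d):
            """Exhaustive search over all of {0,1}^d: O(2^d · poly(nd)) time."""
            return min(product((0, 1), repeat=d), key=lambda c: radius(c, X))

        def discrete_closest_string(X):
            """Candidate must come from X itself: compare all pairs, O(n^2 d) time."""
            return min(X, key=lambda c: radius(c, X))

        X = [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
        print(continuous_closest_string(X, 3), discrete_closest_string(X))  # radius-1 centers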
